Assessing Learning Paradigms in Text Classification

Authors

MOHAMMED ABDUL WAJEED

MOHAMMAD ABDUL RAHMAN

Abstract

Today, abundant information is available owing to the advent of the Internet, but it is usually stored with only current needs in mind. Such data therefore rests unclassified in dump repositories. If it were instead stored in a classified repository, or classified at a later stage, navigating and reaching it would be easier, which in turn could help in decision making. Classification commonly adopts either the supervised or the unsupervised paradigm. Semi-supervised learning is a newer term for the approach that lies between supervised and unsupervised learning: in addition to the unlabeled data, the algorithm is provided with some supervision information, but not necessarily for all examples. A blend of supervised and unsupervised classification is explored through the formation of fuzzy clusters based on the importance of the terms in each class. Enhancements to the traditional KNN algorithm are explored that assign different weights to the features based on the variance in each class. Finally, the results obtained in the supervised and semi-supervised paradigms are compared.
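The abstract does not spell out how the variance-based feature weights are computed, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a feature whose values vary little within each class is more reliable and should count more in the distance. The function names `variance_weights` and `weighted_knn_predict`, the weighting formula `1 / (1 + mean within-class variance)`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import pvariance


def variance_weights(X, y):
    """Assumed weighting: inverse of the average within-class variance per feature."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    weights = []
    for j in range(len(X[0])):
        # Population variance of feature j inside each class, averaged over classes.
        per_class = [pvariance([row[j] for row in rows]) for rows in by_class.values()]
        mean_var = sum(per_class) / len(per_class)
        # Low within-class variance gives a weight close to 1; high variance shrinks it.
        weights.append(1.0 / (1.0 + mean_var))
    return weights


def weighted_knn_predict(X, y, query, k=3, weights=None):
    """Classify `query` by majority vote among its k nearest training rows,
    using a feature-weighted squared Euclidean distance."""
    if weights is None:
        weights = variance_weights(X, y)
    dists = []
    for row, label in zip(X, y):
        d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, query))
        dists.append((d, label))
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy term-weight vectors (hypothetical): feature 0 is stable within each class,
# feature 1 is noisy within each class.
X = [[1.0, 0.0], [1.0, 10.0], [1.0, 20.0],
     [5.0, 0.0], [5.0, 10.0], [5.0, 20.0]]
y = ["sports", "sports", "sports", "politics", "politics", "politics"]

print(variance_weights(X, y))
print(weighted_knn_predict(X, y, [1.5, 20.0], k=3))
```

In this sketch the noisy second feature receives a much smaller weight than the stable first one, so the neighbours are chosen mainly by the feature that separates the classes. Other choices, such as per-class weights or a different normalisation, would fit the abstract's description equally well.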


Similar resources

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods are needed to search, filter, and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...


[Socioeconomic status and health: a discussion of two paradigms].

Socioeconomic status and its impact on health are in the mainstream of public health thinking. This text discusses two paradigms utilized in assessing socioeconomic status in epidemiologic studies. One paradigm refers to prestige-based measurements and positive differentiation among social strata. This paradigm is characterized by classifications assessing social capital and the access to goods...


An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, the production of text documents has grown exponentially, which is why their proper classification seems necessary for better access. One of the main problems in classifying text documents is working in a high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...


Towards Multi Label Text Classification through Label Propagation

Classifying text data has been an active area of research for a long time. A text document is a multifaceted object and is often inherently ambiguous. Multi-label learning deals with such ambiguous objects. Classifying such ambiguous text objects often makes the classifier's task difficult when assigning relevant classes to an input document. Traditional single label and multi class text cla...
